30 research outputs found

    Learning of a multilingual bitaxonomy of Wikipedia and its application to semantic predicates

    The ability to extract hypernymy information on a large scale is becoming increasingly important in natural language processing, an area of artificial intelligence which deals with the processing and understanding of natural language. While initial studies extracted this type of information from textual corpora by means of lexico-syntactic patterns, over time researchers moved to alternative, more structured sources of knowledge, such as Wikipedia. After the first attempts to extract is-a information from Wikipedia categories, a full line of research gave birth to numerous knowledge bases containing information which, however, is either incomplete or irremediably bound to English. To this end we put forward MultiWiBi, the first approach to the construction of a multilingual bitaxonomy which exploits the inner connection between Wikipedia pages and Wikipedia categories to induce a wide-coverage and fine-grained integrated taxonomy. A series of experiments shows state-of-the-art results against all the taxonomic resources available in the literature, also with respect to two novel measures of comparison. Another dimension where existing resources usually fall short is their degree of multilingualism. While knowledge is typically language agnostic, current resources are able to extract relevant information only in languages providing high-quality tools. In contrast, MultiWiBi does not leave any language behind: we show how to taxonomize Wikipedia in an arbitrary language and in a way that is fully independent of additional resources. 
At the core of our approach lies, in fact, the idea that the English version of Wikipedia can be linguistically exploited as a pivot to project the taxonomic information extracted from English to any other Wikipedia language in order to have a bitaxonomy in a second, arbitrary language; as a result, not only concepts which have an English equivalent are covered, but also those concepts which are not lexicalized in the source language. We also present the impact of having the taxonomized encyclopedic knowledge offered by MultiWiBi embedded into a semantic model of predicates (SPred) which crucially leverages Wikipedia to generalize collections of related noun phrases to infer a probability distribution over expected semantic classes. We applied SPred to a word sense disambiguation task and showed that, when MultiWiBi is plugged in to replace an internal component, SPred’s generalization power increases as well as its precision and recall. Finally, we also published MultiWiBi as linked data, a paradigm which fosters interoperability and interconnection among resources and tools through the publication of data on the Web, and developed a public interface which lets users navigate through MultiWiBi’s taxonomic structure in a graphical, captivating manner.

    Massive NGS data analysis reveals hundreds of potential novel gene fusions in human cell lines

    Background: Gene fusions derive from chromosomal rearrangements and the resulting chimeric transcripts are often endowed with oncogenic potential. Furthermore, they serve as diagnostic tools for the clinical classification of cancer subgroups with different prognosis and, in some cases, they can provide specific drug targets. So far, many efforts have been made to study gene fusion events occurring in tumor samples. In recent years, the availability of a comprehensive Next Generation Sequencing dataset for all the existing human tumor cell lines has provided the opportunity to further investigate these data in order to identify novel and still uncharacterized gene fusion events. Results: In our work, we have extensively reanalyzed 935 paired-end RNA-seq experiments downloaded from "The Cancer Cell Line Encyclopedia" repository, aiming to identify novel putative cell-line specific gene fusion events in human malignancies. The bioinformatics analysis was performed by running four different gene fusion detection algorithms. The results were further prioritized by running a Bayesian classifier which performs an in silico validation. The collection of fusion events supported by all of the predictive software tools results in a robust set of ∼1,700 in silico predicted novel candidates suitable for downstream analyses. Given the huge amount of data and information produced, computational results have been systematized in a database named LiGeA. The database can be browsed through a dynamic and interactive web portal, further integrated with validated data from other well-known repositories. Taking advantage of the intuitive query forms, users can easily access, navigate, filter and select the putative gene fusions for further validations and studies. They can also find suitable experimental models for a given fusion of interest. 
Conclusions: We believe that the LiGeA resource can not only represent the first compendium of both known and putative novel gene fusion events in the catalog of all human malignant cell lines, but can also become a handy starting point for wet-lab biologists who wish to investigate novel cancer biomarkers and specific drug targets.

    Two is bigger (and better) than one: the Wikipedia bitaxonomy project

    We present WiBi, an approach to the automatic creation of a bitaxonomy for Wikipedia, that is, an integrated taxonomy of Wikipedia pages and categories. We leverage the information available in either one of the taxonomies to reinforce the creation of the other taxonomy. Our experiments show higher quality and coverage than state-of-the-art resources like DBpedia, YAGO, MENTA, WikiNet and WikiTaxonomy. WiBi is available at http://wibitaxonomy.org

    Language resources and linked data: a practical perspective

    Recently, experts and practitioners in language resources have started recognizing the benefits of the linked data (LD) paradigm for the representation and exploitation of linguistic data on the Web. The adoption of the LD principles is leading to an emerging ecosystem of multilingual open resources that conform to the Linguistic Linked Open Data Cloud, in which datasets of linguistic data are interconnected and represented following common vocabularies, which facilitates linguistic information discovery, integration and access. In order to contribute to this initiative, this paper summarizes several key aspects of the representation of linguistic information as linked data from a practical perspective. The main goal of this document is to provide the basic ideas and tools for migrating language resources (lexicons, corpora, etc.) to LD on the Web and for performing some useful NLP tasks with them (e.g., word sense disambiguation). This material was the basis of a tutorial given at the EKAW’14 conference, which is also reported in the paper.

    Missense mutations of NCAPG gene affect calving ease in Piedmontese cattle: preliminary evidences

    A previous genome scan on 323 Piedmontese individuals identified a cluster of 13 SNPs significantly associated with direct calving ease and centred on the three genes LAP3, LCORL and NCAPG on chromosome 6. We investigated missense mutations affecting calving ease in Piedmontese cattle in the identified region using whole-exome sequences from eight Piedmontese individuals chosen from the extremes of the distribution of estimated breeding values for direct calving ease. The present study found no missense variants in LAP3 and LCORL, while two were identified in NCAPG by three different variant calling methods. Other candidate genes in the same region, such as PPM1K, PKD2, SPP1 and MEPE, harbour missense mutations, but both SIFT analysis and a chi-square test on allele frequencies lead us to hypothesise that NCAPG is the single gene responsible for the trait variation. The two SNPs in NCAPG are in complete linkage disequilibrium in our samples; therefore, further investigations are needed in order to discriminate their roles.

    REDIportal: millions of novel A-to-I RNA editing events from thousands of RNAseq experiments

    RNA editing is a relevant epitranscriptome phenomenon able to increase the transcriptome and proteome diversity of eukaryotic organisms. ADAR-mediated RNA editing is widespread in humans, in whom millions of A-to-I changes modify thousands of primary transcripts. RNA editing has pivotal roles in the regulation of gene expression, the modulation of the innate immune response and the functioning of several neurotransmitter receptors. Massive transcriptome sequencing has fostered research in this field. Nonetheless, different aspects of RNA editing biology are still unknown and need to be elucidated. To support the study of A-to-I RNA editing we have updated our REDIportal catalogue, raising its content to about 16 million events detected in 9642 human RNAseq samples from the GTEx project by using a dedicated pipeline based on the HPC version of the REDItools software. REDIportal now allows searches at sample level, provides overviews of RNA editing profiles for each RNAseq experiment, implements a Gene View module to look at individual events in their genic context and hosts the CLAIRE database. Starting from this new version, REDIportal will begin collecting non-human RNA editing changes for comparative genomics investigations. The database is freely available at http://srv00.recas.ba.infn.it/atlas/index.html

    Decreased expression of Klotho in cardiac atria biopsy samples from patients at higher risk of atherosclerotic cardiovascular disease

    Background. Klotho proteins (α- and β-Klotho) are membrane-based circulating proteins that regulate cell metabolism, as well as the lifespan-modulating activity of Fibroblast Growth Factors. Recent data have shown that higher plasma circulating Klotho levels reduce cardiovascular risk, suggesting Klotho has a protective role in cardiovascular diseases. However, although so far it has been identified in various organs, it is unknown whether cardiomyocytes express Klotho and Fibroblast Growth Factors (FGFs), and whether high cardiovascular risk could affect cardiac expression of Klotho, FGFs and other molecules. Methods. We selected 20 patients with a high estimated 10-year atherosclerotic cardiovascular disease risk and 10 age-matched control subjects with a low estimated 10-year risk who underwent cardiac surgery for reasons other than coronary artery by-pass. In myocardial biopsies, we evaluated by immunohistochemistry whether Klotho and FGFs were expressed in cardiomyocytes, and whether higher cardiovascular risk influenced the expression of other molecules involved in endoplasmic reticulum stress, oxidative stress, inflammation and fibrosis. Results. Only cardiomyocytes of patients with a higher cardiovascular risk showed lower expression of Klotho, but higher expression of FGFs. Furthermore, higher cardiovascular risk was associated with increased expression of molecules involved in oxidative and endoplasmic reticulum stress, inflammation and fibrosis. Conclusions. This study showed for the first time that Klotho proteins are expressed in human cardiomyocytes and that cardiac expression of Klotho is down-regulated in higher cardiovascular risk patients, while expression of stress-related molecules was significantly increased.

    Klotho expression in cardiomyocytes in patients at a higher atherosclerotic cardiovascular disease risk

    Klotho proteins (α- and β-Klotho) are transmembrane proteins whose extracellular domain is secreted into blood and urine by ectodomain shedding. As such they behave as circulating proteins that regulate cell metabolism, endothelial function and calcium homeostasis, as well as modulating the lifespan-connected activity of Fibroblast Growth Factors (FGFs, mainly 21 and 23) and other molecules (1). Recent data have shown that the highest levels of plasma circulating Klotho are associated with a lower cardiovascular risk, thereby suggesting a possible role for Klotho in cardiovascular diseases (2). However, although Klotho has been identified in various organs, including kidney, brain, adipose tissue and intestine, it is unknown whether cardiomyocytes express Klotho and FGFs and, if so, whether high cardiovascular risk can be associated with cardiac expression of Klotho, FGFs and other related molecules. We examined myocardial biopsies from 20 patients with an estimated 10-year atherosclerotic cardiovascular disease (ASCVD) risk in the range of 5% to 7.5% and 10 age-matched control subjects with an estimated 10-year ASCVD risk of <5% (3) undergoing cardiac surgery other than coronary artery by-pass. Both groups of patients were statin naive, had normal hemoglobin A1c, normal coronary arteries on left heart catheterization, and had LDL cholesterol levels between 70 and 189 mg/dL. Using immunohistochemistry methods, we evaluated Klotho and FGF expression in human cardiomyocytes, and whether higher ASCVD risk influenced the expression of other molecules involved in endoplasmic reticulum stress (GRP78), oxidative stress (SOD1, NFkB) and inflammation (iNOS, eNOS). Cardiomyocytes of patients with a higher ASCVD risk exhibited lower expression of Klotho, but also higher expression of FGFs, as compared to cardiomyocytes of patients with a reduced ASCVD risk. 
Furthermore, higher ASCVD risk was associated with significantly increased expression of GRP78, SOD1, NFkB and iNOS (all p<0.05). This study shows for the first time that Klotho proteins are inherently expressed in human cardiomyocytes and also that cardiac expression of Klotho is down-regulated in higher ASCVD risk patients, while the expression of FGFs and other stress-related molecules involved in myocyte damage, such as GRP78, SOD1, NFkB and iNOS, is significantly increased. Further studies are warranted to investigate the association between the expression of Klotho, FGFs and related molecules and cardiovascular risk.